A NEW METHOD OF NORMAL APPROXIMATION^1

By Sourav Chatterjee 
University of California, Berkeley 

We introduce a new version of Stein's method that reduces a large 
class of normal approximation problems to variance bounding exer- 
cises, thus making a connection between central limit theorems and 
concentration of measure. Unlike Skorokhod embeddings, the object 
whose variance must be bounded has an explicit formula that makes 
it possible to carry out the program more easily. As an application, 
we derive a general CLT for functions that are obtained as combi- 
nations of many local contributions, where the definition of "local" 
itself depends on the data. Several examples are given, including the 
solution to a nearest-neighbor CLT problem posed by P. Bickel. 

1. Introduction. Central limit theorems for general nonadditive func- 
tions of independent random variables have been studied by various au- 
thors using a variety of techniques. Some examples are: (i) the method of 
Hajek projections and some sophisticated extensions (e.g., [20, 38, 43]); (ii) 
Stein's method of normal approximation (references in Section 2); (iii) the 
big-blocks-small-blocks technique and its modern multidimensional versions 
(e.g. [2, 6]); (iv) the martingale approach and Skorokhod embeddings; (v) 
the method of moments. In this paper, we present a new approach that may 
go beyond the limitations of these existing techniques. The power of the 
method is demonstrated through several applications, mainly geometrical in 
nature, that are otherwise difficult. In the related article [10], we provide 
some applications to random matrices. 

The paper is organized as follows. Section 2 contains a brief discussion of 
Stein's method and our main results (Theorems 2.2 and 2.5). Examples are 
worked out in Section 3. Proofs of the main theorems are in Section 4. 



Received November 2006; revised July 2007. 

^1 Supported in part by NSF Grant DMS-07-07054 and a Sloan Research Fellowship in
Mathematics.

AMS 2000 subject classifications. 60F05, 60B10, 60D05. 

Key words and phrases. Normal approximation, central limit theorem, Stein's method,
nearest neighbors, coverage processes, quadratic forms, occupancy problems.

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Probability, 
2008, Vol. 36, No. 4, 1584-1610. This reprint differs from the original in 
pagination and typographic detail. 






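2. Results. Recall that the Kantorovich-Wasserstein distance between
two probability measures $\mu$ and $\nu$ on the real line is defined as

$$\mathcal{W}(\mu,\nu) := \sup\left\{ \left| \int h\,d\mu - \int h\,d\nu \right| : h \text{ Lipschitz, with } \|h\|_{\mathrm{Lip}} \le 1 \right\}.$$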

Convergence of measures in the Kantorovich-Wasserstein metric is stronger 
than weak convergence. Based on the Kantorovich-Wasserstein distance, we 
introduce the following measure of "distance to Gaussianity." 

Definition 2.1. Let $W$ be a real-valued random variable with finite
second moment. Let $\mu$ be the law of $(W - \mathbb{E}(W))/\sqrt{\mathrm{Var}(W)}$ and let $\nu$ be
the standard Gaussian law. We define

$$\delta_W := \mathcal{W}(\mu,\nu).$$

This is our preferred metric of normal approximation in this paper. Using 
the bounds on the Wasserstein distance, analogous results can be obtained 
for the Kolmogorov distance via smoothing, but the rates will be suboptimal. 
This problem is very common in Stein's method; obtaining optimal rates for 
the Kolmogorov metric requires extra work and new ideas (see, e.g., [13]). 
Since our main focus is on convergence to normality and not so much on 
error bounds, we will not worry about this issue here. 

2.1. Stein's method. A well-known computation via integration by parts
shows that if $Z \sim N(0,1)$, then $\mathbb{E}(\varphi(Z)Z) = \mathbb{E}(\varphi'(Z))$ for all absolutely
continuous $\varphi$ with $\mathbb{E}|\varphi'(Z)| < \infty$. Conversely, if $W$ is a random variable
satisfying $\mathbb{E}(\varphi(W)W) = \mathbb{E}(\varphi'(W))$ for all Lipschitz $\varphi$, then $W \sim N(0,1)$.
Consequently, if $W$ is such that $\mathbb{E}(\varphi(W)W) \approx \mathbb{E}(\varphi'(W))$ for all $\varphi$ belonging to
a large class of functions, then one can expect the distribution of $W$ to be
close to the $N(0,1)$ distribution.

This is the key idea behind Stein's method of normal approximation, 
introduced by Charles Stein in the seminal paper [40] and later developed 
in his book [41]. Precise error bounds can be obtained in various ways. We 
will reproduce one of Stein's results (Lemma 4.2) that gives a bound on the 
Kantorovich-Wasserstein distance to normality. 
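
As a small sanity check of the characterizing identity $\mathbb{E}(\varphi(Z)Z) = \mathbb{E}(\varphi'(Z))$, the following Python sketch verifies it exactly for polynomial test functions, using the standard Gaussian moments $\mathbb{E}(Z^{2m}) = (2m-1)!!$. The function names and the test polynomial are illustrative choices and do not come from the paper.

```python
from fractions import Fraction

def gaussian_moment(k: int) -> int:
    """E(Z^k) for Z ~ N(0,1): 0 for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for odd in range(1, k, 2):
        result *= odd
    return result

def expect_poly(coeffs):
    """E(p(Z)) for the polynomial p(z) = sum_k coeffs[k] * z**k."""
    return sum(Fraction(c) * gaussian_moment(k) for k, c in enumerate(coeffs))

def stein_sides(coeffs):
    """Return (E(phi(Z) Z), E(phi'(Z))) for the polynomial phi with these coefficients."""
    times_z = [0] + list(coeffs)                            # coefficients of z * phi(z)
    derivative = [k * c for k, c in enumerate(coeffs)][1:]  # coefficients of phi'(z)
    return expect_poly(times_z), expect_poly(derivative)

# Illustrative test function: phi(z) = 2 - z + 3 z^2 + 5 z^3
lhs, rhs = stein_sides([2, -1, 3, 5])
print(lhs, rhs)  # both sides equal 14
```

The check is exact because only Gaussian moments of polynomials are involved; it illustrates the identity but says nothing about the converse direction.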

However, the problem begins at this point. Given a random variable $W$
that may be a complicated function of many other variables, there is no
general method for showing that $\mathbb{E}(\varphi(W)W) \approx \mathbb{E}(\varphi'(W))$. Several powerful
techniques for carrying out this step under special conditions are available
in the literature on Stein's method (e.g., exchangeable pairs [41], diffusion
generators [5], dependency graphs [4, 13, 36], size bias transforms [23], zero
bias transforms [22], specialized procedures like [21, 34, 35] and recent
developments [11, 12], to cite a few), but, somehow, they all require something
"nice" to happen. It is rarely the case that something arbitrary (like the
Levina-Bickel statistic [30], to be discussed later) becomes amenable to any
of the existing versions of Stein's method.

This is the ground that we attempt to break in this paper. Given a random
variable $W$ that is an explicit but arbitrary function of a collection
of independent random variables, satisfying $\mathbb{E}(W) = 0$ and $\mathbb{E}(W^2) = 1$, we
prescribe a method of constructing another random variable $T$ so that for
all smooth $\varphi$, we have

$$\mathbb{E}(\varphi(W)W) \approx \mathbb{E}(\varphi'(W)T).$$

In particular, taking $\varphi$ to be the identity function, we get $\mathbb{E}(T) \approx 1$. If, now,
$\mathrm{Var}(T)$ is small enough, then we can make the easy but crucial deduction
that $T$ "can be substituted by the constant $\mathbb{E}(T)$" to get $\mathbb{E}(\varphi(W)W) \approx
\mathbb{E}(\varphi'(W))$, which shows that the distribution of $W$ is approximately standard
Gaussian. Thus, the normal approximation problem is reduced to the
problem of bounding the variance of $T$. Of course, the crux of the matter
lies in the construction of $T$, which we undertake below.

2.2. An abstract result. Let $\mathcal{X}$ be a measure space and suppose $X =
(X_1,\ldots,X_n)$ is a vector of independent $\mathcal{X}$-valued random variables. Let $X' =
(X'_1,\ldots,X'_n)$ be an independent copy of $X$. Let $[n] = \{1,\ldots,n\}$, and for each
$A \subseteq [n]$, define the random vector $X^A$ as

$$X^A_i = \begin{cases} X'_i, & \text{if } i \in A, \\ X_i, & \text{if } i \notin A. \end{cases}$$

When $A$ is a singleton set like $\{j\}$, we will simply write $X^j$ instead of $X^{\{j\}}$.
Let $f : \mathcal{X}^n \to \mathbb{R}$ be a measurable function. We define a randomized derivative
of $f(X)$ along the $j$th coordinate as

$$\Delta_j f(X) := f(X) - f(X^j).$$

Note that $\Delta_j f(X)$ depends not only on the vector $X$, but also on $X'_j$. Next,
for each $A \subseteq [n]$, let

$$T_A := \sum_{j \notin A} \Delta_j f(X)\,\Delta_j f(X^A)$$

and let

$$T = \frac{1}{2} \sum_{A \subsetneq [n]} \frac{T_A}{\binom{n}{|A|}(n-|A|)}. \tag{1}$$

Putting $W = f(X)$ and assuming that $\mathbb{E}(W) = 0$, we show (Lemma 2.4)
that whenever $\sum_{j=1}^n \mathbb{E}|\Delta_j f(X)|^3$ is small, we have $\mathbb{E}(\varphi(W)W) \approx \mathbb{E}(\varphi'(W)T)$
for all $\varphi$ belonging to a large class of functions. The main consequence is the
following normal approximation theorem, which is our main abstract result.
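
Theorem 2.2. Let all terms be defined as above and let $W = f(X)$.
Suppose that $\mathbb{E}(W) = 0$ and $\sigma^2 := \mathbb{E}(W^2) < \infty$. Then, $\mathbb{E}(T) = \sigma^2$ and

$$\delta_W \le \frac{[\mathrm{Var}(\mathbb{E}(T \mid W))]^{1/2}}{\sigma^2} + \frac{1}{2\sigma^3} \sum_{j=1}^n \mathbb{E}|\Delta_j f(X)|^3,$$

where $\delta_W$ is the distance to normality defined in Definition 2.1.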

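For the simplest possible application of Theorem 2.2, let $X_1,\ldots,X_n$ be
i.i.d. real-valued random variables with $\mathbb{E}(X_i) = 0$ and $\mathbb{E}(X_i^2) = 1$, and let
$W = f(X) := n^{-1/2}\sum_{i=1}^n X_i$. Then, for any $A \subseteq [n]$ and $j \notin A$,

$$\Delta_j f(X^A) = n^{-1/2}(X_j - X'_j).$$

Thus, $T_A = \frac{1}{n}\sum_{j \notin A}(X_j - X'_j)^2$. A simple verification now shows that

$$T = \frac{1}{2n}\sum_{j=1}^n (X_j - X'_j)^2.$$

Now, assuming that $\mathbb{E}(X_i^4) < \infty$ and using the inequality $\mathrm{Var}(\mathbb{E}(T \mid W)) \le
\mathrm{Var}(T)$, we see that $\delta_W \le Cn^{-1/2}$ for some constant $C$ that depends only
on the distribution of the $X_i$'s.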
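
For the normalized sum, the sum over subsets in (1) can also be carried out by brute force. The sketch below evaluates $T$ directly from the definition and compares it with $\frac{1}{2n}\sum_j (X_j - X'_j)^2$, the value obtained by summing the weights $1/(\binom{n}{|A|}(n-|A|))$ over all $A$ not containing $j$ (which gives 1 for each $j$). The helper names and the Gaussian test data are assumptions made only for this illustration.

```python
import random
from itertools import combinations
from math import comb, sqrt

def f(x):
    """Normalized sum n^{-1/2} (x_1 + ... + x_n)."""
    return sum(x) / sqrt(len(x))

def swap(x, x_prime, A):
    """The vector X^A: coordinates in A are taken from x_prime, the rest from x."""
    return [x_prime[i] if i in A else x[i] for i in range(len(x))]

def T_statistic(f, x, x_prime):
    """Evaluate T from (1) by summing over all proper subsets A of [n]."""
    n = len(x)
    total = 0.0
    for size in range(n):                                  # |A| = 0, ..., n-1
        weight = 1.0 / (comb(n, size) * (n - size))
        for A in map(set, combinations(range(n), size)):
            xA = swap(x, x_prime, A)
            for j in range(n):
                if j in A:
                    continue
                delta_x = f(x) - f(swap(x, x_prime, {j}))      # Delta_j f(X)
                delta_xA = f(xA) - f(swap(xA, x_prime, {j}))   # Delta_j f(X^A)
                total += weight * delta_x * delta_xA
    return total / 2

rng = random.Random(0)
n = 5
x = [rng.gauss(0, 1) for _ in range(n)]
x_prime = [rng.gauss(0, 1) for _ in range(n)]

direct = T_statistic(f, x, x_prime)
closed_form = sum((a - b) ** 2 for a, b in zip(x, x_prime)) / (2 * n)
print(direct, closed_form)  # the two values agree up to rounding
```

The brute-force evaluation costs on the order of $n2^n$ function calls, so it is only meant for checking small cases.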

A shortcoming of Theorem 2.2 is that it does not say anything about the
variance $\sigma^2$. Somewhat mysteriously, we get a normal approximation result
without having to evaluate the variance of our statistic. Of course, the error
bound depends on $\sigma^2$ and to show that the bound is useful, we require a
lower bound on $\sigma^2$. We prefer to think of that as a separate problem.

The proof of Theorem 2.2, and indeed our whole technique, rests on
the following "local-to-global" lemma that deserves to be mentioned in its
own right. It is closely connected to certain techniques introduced in the
author's previous works [8, 9].

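Lemma 2.3. For any $g, f : \mathcal{X}^n \to \mathbb{R}$ such that $\mathbb{E}g(X)^2$ and $\mathbb{E}f(X)^2$ are
both finite, we have

$$\mathrm{Cov}(g(X), f(X)) = \frac{1}{2} \sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} \mathbb{E}[\Delta_j g(X)\,\Delta_j f(X^A)].$$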

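A consequence of the above lemma, to be proven in Section 4, is the
following result, which shows that $\mathbb{E}(W\varphi(W)) \approx \mathbb{E}(\varphi'(W)T)$ for all "nice"
$\varphi$, whenever $\sum_{j=1}^n \mathbb{E}|\Delta_j f(X)|^3$ is small.

Lemma 2.4. Let $W = f(X)$ and suppose that $\mathbb{E}(W) = 0$ and $\mathbb{E}(W^2) = 1$.
Then, for any $\varphi \in C^2(\mathbb{R})$ with bounded second derivative, we have

$$|\mathbb{E}(\varphi(W)W) - \mathbb{E}(\varphi'(W)T)| \le \frac{\|\varphi''\|_\infty}{4} \sum_{j=1}^n \mathbb{E}|\Delta_j f(X)|^3.$$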

This lemma has a connection with the Goldstein-Reinert zero-bias transform
version of Stein's method [22], which we now explain. Given a random
variable $W$ with mean zero and unit variance, a random variable $W^*$ is said
to be a zero-bias transform of $W$ if for all absolutely continuous $\varphi$,

$$\mathbb{E}(W\varphi(W)) = \mathbb{E}(\varphi'(W^*))$$

whenever both sides are well defined. It is shown in [22] that a zero-bias
transform always exists and the closeness to normality for $W$ can be measured
by the closeness of the distributions of $W$ and $W^*$ (which is usually
done by constructing $W^*$ such that $W \approx W^*$). The problem with this approach,
again, is that zero-bias transforms are hard to construct in general.
Lemma 2.4 tells us that whenever $\sum_{j=1}^n \mathbb{E}|\Delta_j f(X)|^3$ is small, we have
$\mathbb{E}(W\varphi(W)) \approx \mathbb{E}(\varphi'(W)\mathbb{E}(T \mid W))$. This means, roughly, that $\mathbb{E}(T \mid W)$ is
approximately the Radon-Nikodym density of the law of $W^*$ with respect to
the law of $W$, although such a density may not actually exist. Incidentally,
such densities have been studied before, for example, in [7].
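
Returning to the local-to-global identity of Lemma 2.3, which expresses $\mathrm{Cov}(g(X), f(X))$ as half the weighted sum over proper subsets $A$ and $j \notin A$ of $\mathbb{E}[\Delta_j g(X)\Delta_j f(X^A)]$: for small $n$ both sides can be computed exactly by enumeration. The sketch below does this for i.i.d. uniform $\pm 1$ coordinates with $n = 3$; the two test functions and all helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def swap(x, x_prime, A):
    """X^A: coordinates in A come from x_prime, the others from x."""
    return tuple(x_prime[i] if i in A else x[i] for i in range(len(x)))

def rhs_of_lemma(g, f, n, values=(-1, 1)):
    """Right-hand side of the identity, with the expectation computed by enumeration."""
    points = list(product(values, repeat=n))
    prob = Fraction(1, len(points) ** 2)          # uniform law of the pair (X, X')
    total = Fraction(0)
    for size in range(n):                         # proper subsets only: |A| < n
        weight = Fraction(1, comb(n, size) * (n - size))
        for A in map(set, combinations(range(n), size)):
            for j in set(range(n)) - A:
                for x in points:
                    for xp in points:
                        xA = swap(x, xp, A)
                        dg = g(x) - g(swap(x, xp, {j}))      # Delta_j g(X)
                        df = f(xA) - f(swap(xA, xp, {j}))    # Delta_j f(X^A)
                        total += weight * prob * dg * df
    return total / 2

def covariance(g, f, n, values=(-1, 1)):
    """Cov(g(X), f(X)) for X uniform on values^n."""
    points = list(product(values, repeat=n))
    p = Fraction(1, len(points))
    eg = sum(p * g(x) for x in points)
    ef = sum(p * f(x) for x in points)
    return sum(p * g(x) * f(x) for x in points) - eg * ef

# Two arbitrary, non-additive test functions on {-1, 1}^3.
g = lambda x: x[0] * x[1] + 2 * x[2]
f = lambda x: max(x) + x[0] * x[1] * x[2] + x[0] * x[1]

print(covariance(g, f, 3), rhs_of_lemma(g, f, 3))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a tolerance.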

2.3. A general CLT for structures with local dependence. Numerous cen- 
tral limit theorems in probability theory have been conjectured or proven by 
following the intuition that a CLT for a sum of dependent summands should 
hold if "the dependencies are local in nature." Some notable examples are 
the classical big-blocks-small-blocks technique for analyzing m-dependent 
sequences, its multidimensional generalizations (e.g., [2, 6]), and the depen- 
dency graph method of [3]. Here, we provide a new method that is seemingly 
more powerful than the existing techniques (our applications provide some 
evidence for this claim) and also gives explicit error bounds. The method is 
derived as a nontrivial corollary of Theorem 2.2. 

Let $\mathcal{X}$ be a measure space and suppose $f : \mathcal{X}^n \to \mathbb{R}$ is a measurable map,
where $n \ge 1$ is a fixed positive integer. Suppose $G$ is a map which associates
to every $x \in \mathcal{X}^n$ an undirected graph $G(x)$ on $[n] := \{1,\ldots,n\}$. Such a map
will be called a graphical rule on $\mathcal{X}^n$. We will say that a graphical rule $G$
is symmetric if for any permutation $\pi$ of $[n]$ and any $(x_1,\ldots,x_n) \in \mathcal{X}^n$, the
set of edges in $G(x_{\pi(1)},\ldots,x_{\pi(n)})$ is exactly

$$\{\{\pi(i),\pi(j)\} : \{i,j\} \in G(x_1,\ldots,x_n)\}.$$

Now, fix $m \ge n$. We say that a vector $x \in \mathcal{X}^n$ is embedded in another vector
$y \in \mathcal{X}^m$ if there exist distinct $i_1,\ldots,i_n \in [m]$ with $x_k = y_{i_k}$ for $1 \le k \le n$.
A graphical rule $G'$ on $\mathcal{X}^m$ will be called an extension of $G$ if for any
$x \in \mathcal{X}^n$ embedded in $y \in \mathcal{X}^m$, the graph $G(x)$ on $[n]$ is the naturally induced
subgraph of the graph $G'(y)$ on $[m]$.

Now, take any $x, x' \in \mathcal{X}^n$. For each $i \in [n]$, let $x^i$ be the vector obtained
by replacing $x_i$ with $x'_i$ in the vector $x$. For any two distinct elements $i$
and $j$ of $[n]$, let $x^{ij}$ be the vector obtained by replacing $x_i$ with $x'_i$ and $x_j$
with $x'_j$. We say that the coordinates $i$ and $j$ are noninteracting under the
triple $(f, x, x')$ if

$$f(x) - f(x^j) = f(x^i) - f(x^{ij}).$$

Note that the definition is symmetric in $i$ and $j$. This is just a discrete analog
of the condition

$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x) = 0,$$

which clarifies why it is reasonable to define interaction between coordinates
in this manner.

We will say that a graphical rule $G$ is an interaction rule for a function $f$
if for any choice of $x, x'$ and $i, j$, the event that $\{i,j\}$ is not an edge in the
graphs $G(x)$, $G(x^i)$, $G(x^j)$ and $G(x^{ij})$ implies that $i$ and $j$ are noninteracting
vertices under $(f, x, x')$. Again, in a continuous setup, we would simply
declare that $G(x)$ is the graph that puts an edge between $i$ and $j$ if and only
if

$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x) \ne 0.$$

Clearly, this is a naturally acceptable definition of an interaction rule (or
interaction graph) for $f$. Since we do not want to confine ourselves to the
continuous case, the definitions become a bit more complex.



Theorem 2.5. Let $f : \mathcal{X}^n \to \mathbb{R}$ be a measurable map that admits a symmetric
interaction rule $G$. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. $\mathcal{X}$-valued
random variables and let $X = (X_1,\ldots,X_n)$. Let $W = f(X)$ and $\sigma^2 = \mathrm{Var}(W)$.
Let $X' = (X'_1,\ldots,X'_n)$ be an independent copy of $X$. For each $j$, define

$$\Delta_j f(X) = W - f(X_1,\ldots,X_{j-1},X'_j,X_{j+1},\ldots,X_n)$$

and let $M = \max_j |\Delta_j f(X)|$. Let $G'$ be an arbitrary symmetric extension of
$G$ on $\mathcal{X}^{n+4}$ and put

$$\delta := 1 + \text{degree of the vertex 1 in } G'(X_1,\ldots,X_{n+4}).$$

We then have

$$\delta_W \le \frac{Cn^{1/2}}{\sigma^2}\,\mathbb{E}(M^8)^{1/4}\,\mathbb{E}(\delta^4)^{1/4} + \frac{1}{2\sigma^3}\sum_{j=1}^n \mathbb{E}|\Delta_j f(X)|^3,$$

where $\delta_W$ is the distance to normality defined in Definition 2.1 and $C$ is a
universal constant.

3. Examples. This section is devoted to working out applications of Theorems
2.2 and 2.5. Some of these are new results, while others are simpler
proofs of existing results. In general, we do not investigate whether our con- 
vergence rates are optimal, but in examples where the answers are known, 
our rates match the existing ones. References to the relevant literature are 
given in the appropriate places. 

3.1. Quadratic forms. Suppose $X_1,\ldots,X_n$ are i.i.d. real-valued random
variables with zero mean, unit variance and finite fourth moment. Let $A =
(a_{ij})_{1 \le i,j \le n}$ be a real symmetric matrix. We consider the following question:
under what conditions on the matrix $A$ can we say that the quadratic form
$W = \sum_{i \le j} a_{ij}X_iX_j$ is approximately Gaussian?

The answer to this question is not very simple; for instance, the usual
methods for U-statistics do not work for this problem. The best known
condition in the literature (see, e.g., Rotar [37], Hall [27], de Jong [15]) says
that asymptotic normality holds if we have a sequence of symmetric matrices
$A_n$ satisfying

$$\lim_{n\to\infty} \sigma_n^{-4}\,\mathrm{Tr}(A_n^4) = 0 \quad\text{and}\quad \lim_{n\to\infty} \sigma_n^{-2} \max_{1 \le i \le n} \sum_{j=1}^n a_{ij}^2 = 0, \tag{2}$$

where $\sigma_n^2 = \frac{1}{2}\mathrm{Tr}(A_n^2) = \mathrm{Var}(W_n)$. The first condition may seem strange, but
it is actually equivalent to

$$\lim_{n\to\infty} \frac{\mathbb{E}(W_n - \mathbb{E}(W_n))^4}{(\mathrm{Var}(W_n))^2} = 3,$$

which is a necessary condition for convergence to normality if the sequence 
{^n}n>i is uniformly integrable. The best error bounds were obtained by 
Götze and Tikhomirov [24, 25].

It is possible to deal with this problem quite easily using our method. 
Since this is meant to be only an illustration, we keep the expressions as 
simple as possible by letting the $X_i$'s be $\pm 1$ Rademacher random variables.

Proposition 3.1. Let $X = (X_1,\ldots,X_n)$ be a vector of i.i.d. random
variables with $\mathbb{P}(X_i = 1) = \mathbb{P}(X_i = -1) = 1/2$. Let $A = (a_{ij})_{1 \le i,j \le n}$ be a real
symmetric matrix. Let $W = \sum_{i \le j} a_{ij}X_iX_j$ and $\sigma^2 = \mathrm{Var}(W) = \frac{1}{2}\mathrm{Tr}(A^2)$.
Then,

where $\delta_W$ is the distance to normality defined in Definition 2.1.

Note that the sufficiency of the classical condition (2) is implied by the above
result, because

$$\sum_{i=1}^n \Bigl(\sum_{j=1}^n a_{ij}^2\Bigr)^{3/2} \le 2\sigma^2 \max_{1 \le i \le n} \Bigl(\sum_{j=1}^n a_{ij}^2\Bigr)^{1/2}.$$



Proof of Proposition 3.1. We will freely use the notation from Theorem
2.2 in this proof. Without loss of generality, we can replace $a_{ij}$ by $a_{ij}/\sigma$
and assume that $\mathrm{Tr}(A^2) = \sum_{i,j} a_{ij}^2 = 2$. Again, since $\mathbb{E}(W) = \sum_{i=1}^n a_{ii}$, we
can assume that $a_{ii} = 0$ for all $i$ after subtracting the mean. Then, note that
for any $A \subseteq [n]$ and $i \notin A$,

$$\Delta_i f(X^A) = (X_i - X'_i)\Bigl(\sum_{j \notin A} a_{ij}X_j + \sum_{j \in A} a_{ij}X'_j\Bigr).$$



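Thus, we have

$$\begin{aligned}
\mathbb{E}(\Delta_i f(X)\,\Delta_i f(X^A) \mid X)
&= \mathbb{E}\Bigl((X_i - X'_i)^2 \Bigl(\sum_{j=1}^n a_{ij}X_j\Bigr)\Bigl(\sum_{j \notin A} a_{ij}X_j + \sum_{j \in A} a_{ij}X'_j\Bigr) \Bigm| X\Bigr) \\
&= 2\Bigl(\sum_{j=1}^n a_{ij}X_j\Bigr)\Bigl(\sum_{j \notin A} a_{ij}X_j\Bigr) = 2\sum_{j \in [n]\setminus A,\; k \in [n]} a_{ij}a_{ik}X_jX_k.
\end{aligned}$$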



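A simple verification now shows that

$$\mathbb{E}(T \mid X) = \sum_{1 \le i,j,k \le n} a_{ij}a_{ik}X_jX_k \Biggl(\sum_{A \subseteq [n]\setminus\{i,j\}} \frac{1}{\binom{n}{|A|}(n-|A|)}\Biggr) = \frac{1}{2}X^tA^2X,$$

where $X^t$ stands for the transpose of the column vector $X$. Let $b_{ij}$ denote
the $(i,j)$th element of $A^2$. Since $X_i^2 = 1$, the above identity shows that

$$\mathrm{Var}(\mathbb{E}(T \mid X)) = \mathrm{Var}\Bigl(\sum_{i<j} b_{ij}X_iX_j\Bigr) = \sum_{i<j} b_{ij}^2 \le \frac{1}{2}\mathrm{Tr}(A^4).$$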

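Finally, by Khintchine's inequality [26], we get

$$\mathbb{E}|\Delta_i f(X)|^3 = 4\,\mathbb{E}\Bigl|\sum_{j=1}^n a_{ij}X_j\Bigr|^3 \le C\Bigl(\sum_{j=1}^n a_{ij}^2\Bigr)^{3/2},$$

where $C$ is an absolute constant.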

The proof is now completed by using the above bounds in Theorem 2.2. □ 

3.2. An occupancy problem. Suppose $n$ balls are dropped into $\alpha n$ boxes
such that all $(\alpha n)^n$ possibilities are equally likely. Let $W$ be the number
of empty boxes. The distribution of $W$ is completely known from elementary
probability (see, e.g., Feller [19], Section IV.2; for extensive references,
see [16]). Very general error bounds for the normal approximation of random
variables like $W$ are also known [18]. For illustrative purposes, we now apply
Theorem 2.5 to prove a CLT for $W$ when $\alpha$ remains fixed and $n$ tends to
infinity.

Proposition 3.2. Let W be the number of empty boxes as above. Then, 



n 



where $\delta_W$ is the distance to normality defined in Definition 2.1, $f(\alpha)$ =
(ae~^l'^ — (1 + 0)6"^/°^)"^/^ and C is a universal constant. 

Remark. This matches the sharp convergence rate obtained in [18], 
although that result is for the Kolmogorov distance. 

Proof of Proposition 3.2. In the following discussion, we are going
to freely use the terms defined in the statement of Theorem 2.5 without
explicit mention. Let $\mathcal{X}$ be the set of labels of the $\alpha n$ boxes and let $X_i$
denote the label of the box into which ball $i$ is dropped. Let $X = (X_1,\ldots,X_n)$
and let $W = f(X)$ denote the number of empty boxes in the configuration
$X$. Then, the transformation $X \to X^j$ denotes the action of moving
ball $j$ from its current box to a box chosen uniformly at random. Clearly,
$|\Delta_j f(X)| \le 1$ always and therefore $M \le 1$, where $M = \max_j |\Delta_j f(X)|$ as
defined in Theorem 2.5.

Let us now define an interaction graph for this problem. Given a configuration
$x$, let $G(x)$ be the graph on $[n]$ that puts an edge between $i$ and $j$
if and only if $x_i = x_j$, that is, the balls $i$ and $j$ land in the same box in the
configuration $x$. It is easy to see that $G$ is symmetric. Let us show that $G$
is indeed an interaction graph for $f$ according to our definition.

Let $x'$ be another configuration and let $x^i$, $x^j$ and $x^{ij}$ be defined as usual.
Suppose $\{i,j\}$ is not an edge in $G(x)$, $G(x^i)$, $G(x^j)$ and $G(x^{ij})$. This means
that the balls $i$ and $j$ are in different boxes in all four configurations. Now,
$f(x) - f(x^j)$ depends only on the number of balls other than ball $j$ in the
boxes $x_j$ and $x'_j$. Thus, $f(x) - f(x^j) = f(x^i) - f(x^{ij})$. This proves that $G$ is
an interaction graph for $f$.

Now, define $G'$ on $\mathcal{X}^{n+4}$ in exactly the same way as we defined $G$ on $\mathcal{X}^n$,
that is, given $x \in \mathcal{X}^{n+4}$, $G'(x)$ puts an edge between $i$ and $j$ if and only if
$x_i = x_j$. Again, it is trivial to check that $G'$ is symmetric and that $G'$ is an
extension of $G$.

We now see that by the definition in Theorem 2.5, $\delta$ has the distribution
of the number of balls in a typical box when we drop $n + 4$ balls into $\alpha n$
boxes. Clearly, E{6^) < Ca'^ for some constant C that does not depend 
on $n$. Finally, it is easy to check that $\sigma^2 \sim (\alpha e^{-1/\alpha} - (1+\alpha)e^{-2/\alpha})\,n$ as
$n \to \infty$. The proof is now easy to complete using Theorem 2.5. □
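
The verification that the "same box" graph is an interaction rule for the number of empty boxes can also be done by exhaustive search in a tiny instance. The sketch below enumerates every pair of configurations $x, x'$ for three balls and three boxes and checks that, whenever balls $i$ and $j$ are in different boxes in all four configurations $x, x^i, x^j, x^{ij}$, the identity $f(x) - f(x^j) = f(x^i) - f(x^{ij})$ holds. The function names and the instance size are assumptions chosen for illustration.

```python
from itertools import product

def empty_boxes(x, boxes):
    """f(x): number of boxes (labelled 0..boxes-1) containing no ball."""
    return boxes - len(set(x))

def same_box(x, i, j):
    """Edge rule of G: balls i and j are joined iff they share a box."""
    return x[i] == x[j]

def replace(x, x_prime, coords):
    """Copy of x with the listed coordinates taken from x_prime."""
    return tuple(x_prime[c] if c in coords else x[c] for c in range(len(x)))

def check_interaction_rule(n, boxes):
    """Return (violations, cases) over all x, x' and ordered pairs i != j."""
    violations = cases = 0
    configs = list(product(range(boxes), repeat=n))
    for x in configs:
        for xp in configs:
            for i in range(n):
                for j in range(n):
                    if i == j:
                        continue
                    xi = replace(x, xp, {i})
                    xj = replace(x, xp, {j})
                    xij = replace(x, xp, {i, j})
                    if any(same_box(z, i, j) for z in (x, xi, xj, xij)):
                        continue      # {i, j} is an edge somewhere: nothing to check
                    cases += 1
                    lhs = empty_boxes(x, boxes) - empty_boxes(xj, boxes)
                    rhs = empty_boxes(xi, boxes) - empty_boxes(xij, boxes)
                    violations += lhs != rhs
    return violations, cases

violations, cases = check_interaction_rule(n=3, boxes=3)
print(violations, cases)  # no violations among the cases that were checked
```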

3.3. Coverage processes. Broadly speaking, a stochastic coverage process
is a random collection of (possibly overlapping) subsets of a metric space. 
The classic reference for the general theory of coverage processes is the book 
by Hall [28] (see also Chapter H in Aldous [1]). 

We consider the following type of coverage process. Let $(\mathcal{X},\rho)$ be a separable
metric space endowed with a measure $\lambda$ (think of Euclidean space
with Lebesgue measure) and suppose $X_1,\ldots,X_n$ are i.i.d. random points
on $\mathcal{X}$ drawn according to some probability measure on $\mathcal{X}$. Fix some $\varepsilon > 0$
and let $\mathcal{R}$ be the random region covered by closed balls of radius $\varepsilon$ centered
at $X_1,\ldots,X_n$ (our coverage process). Formally, if $B(u,\varepsilon)$ denotes the closed
ball of radius $\varepsilon$ centered at $u$, then

$$\mathcal{R} = \bigcup_{i=1}^n B(X_i,\varepsilon). \tag{3}$$

We will prove a general CLT for the area $\lambda(\mathcal{R})$. Of course, a large body
of literature on this question already exists, but it is almost exclusively for 
processes on Euclidean spaces, where the analysis can be done by the big- 
blocks-small-blocks technique. The arguments are geometric in nature and 
do not extend to arbitrary metric spaces (e.g., manifolds). Moreover, the 
literature is silent on error bounds. For a discussion of the existing results 
and references, we refer to Section 3.4 of [28] (Theorem 3.5, in particular) 
and the notes at the end of Chapter 3 in the same book. 

Here, we give a general normal approximation result with an error bound 
for the problem mentioned above. It comes as a very easy corollary of The- 
orem 2.5, possibly admitting extensions to more complex normal approxi- 
mation problems in this area. 

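Proposition 3.3. Suppose we have $n$ i.i.d. points $X_1,\ldots,X_n$ on a separable
metric space $(\mathcal{X},\rho)$ endowed with a nonnegative Borel measure $\lambda$.
Given $\varepsilon > 0$, define the set $\mathcal{R}$ as in (3). Put $M_\varepsilon = \sup_{u \in \mathcal{X}} \lambda(B(u,\varepsilon))$ and
$p_\varepsilon = \mathbb{P}(\rho(X_1,X_2) \le 2\varepsilon)$. Let $W = \lambda(\mathcal{R})$ and $\sigma^2 = \mathrm{Var}(W)$. Then,

$$\delta_W \le \frac{Cn^{1/2}M_\varepsilon^2(1 + np_\varepsilon)}{\sigma^2} + \frac{nM_\varepsilon^3}{2\sigma^3},$$

where $\delta_W$ is the distance to normality defined in Definition 2.1 and $C$ is a
universal constant.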

A bound like the above conveys no meaning unless applied to a concrete
example. The simplest such example is the following. Let $\mathcal{X}$ be the unit
square in $\mathbb{R}^2$ and $\varepsilon = n^{-1/2}$. Clearly, $M_\varepsilon \le C_1n^{-1}$ and $p_\varepsilon \le C_2n^{-1}$ for some
constants $C_1$ and $C_2$ that do not depend on $n$. It can be shown (see [28],
Theorem 3.4) that we also have $\sigma^2 \ge C_3n^{-1}$ for some positive constant $C_3$
free of $n$. Plugging these estimates into the above bound, we get $\delta_W \le
Cn^{-1/2}$. Note that in this specific example, we may not get asymptotic
normality if $\varepsilon$ decays faster than $n^{-1/2}$ as $n \to \infty$.

Proof of Proposition 3.3. Given $x \in \mathcal{X}^n$, let $f(x) = \lambda(\mathcal{R}(x))$ and
let $G(x)$ be the graph on $[n]$ that puts an edge between $i$ and $j$ if and only
if $\rho(x_i,x_j) \le 2\varepsilon$. Let us verify that $G$ is an interaction rule for $f$.

Take any $x, x' \in \mathcal{X}^n$ and let $x^i$, $x^j$ and $x^{ij}$ be defined as in the beginning
of Section 2.3. Let $N_j(x)$ be the set of neighbors of $j$ in the graph $G(x)$.
Then, $f(x) - f(x^j) = \lambda(A) - \lambda(B)$, where

$$A = B(x_j,\varepsilon) \setminus \bigcup_{\ell \in N_j(x)} B(x_\ell,\varepsilon) \quad\text{and}\quad B = B(x'_j,\varepsilon) \setminus \bigcup_{\ell \in N_j(x^j)} B(x_\ell,\varepsilon).$$

Now, if $\{i,j\}$ is not an edge in $G(x)$, $G(x^i)$, $G(x^j)$ and $G(x^{ij})$, then it
is easy to see that $N_j(x) = N_j(x^i)$ and $N_j(x^j) = N_j(x^{ij})$, and that $i$ belongs to
none of these sets. It follows that $f(x) - f(x^j) = f(x^i) - f(x^{ij})$. Thus, $G$ is an
interaction rule for $f$. The expression for $f(x) - f(x^j)$ also shows that
$|f(x) - f(x^j)|$ is always bounded by the constant $M_\varepsilon$.

Next, given $x_1,\ldots,x_{n+4} \in \mathcal{X}$, let $G'$ be defined in exactly the same way
that $G$ was defined, that is, include the edge $\{i,j\}$ if and only if $\rho(x_i,x_j) \le
2\varepsilon$. It is trivial to see that $G'$ is an extension of $G$ in the sense defined in
Section 2.3. Thus, if $\delta$ is defined as in Theorem 2.5, then $\delta - 1 \sim \mathrm{Binomial}(n +
3, p_\varepsilon)$, where $p_\varepsilon = \mathbb{P}(\rho(X_1,X_2) \le 2\varepsilon)$. An application of Theorem 2.5
completes the proof. □
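
The interaction rule used in this proof can be illustrated numerically in the simplest metric space, the real line with Lebesgue measure, where the covered region is a union of intervals and its length can be computed exactly. The sketch below samples configurations $x, x'$ and checks that $f(x) - f(x^j) - f(x^i) + f(x^{ij})$ vanishes (up to rounding) whenever $\{i,j\}$ is not an edge in any of the four graphs. The one-dimensional setting, the uniform sampling and all names are assumptions made for this example only.

```python
import random

def covered_length(centers, eps):
    """Lebesgue measure of the union of the closed intervals [c - eps, c + eps]."""
    total, right_end = 0.0, float("-inf")
    for c in sorted(centers):
        left, right = c - eps, c + eps
        if left > right_end:            # starts a new connected component
            total += right - left
            right_end = right
        elif right > right_end:         # extends the current component
            total += right - right_end
            right_end = right
    return total

def is_edge(x, i, j, eps):
    """Edge rule of G: i ~ j iff the two balls can overlap, i.e. |x_i - x_j| <= 2 eps."""
    return abs(x[i] - x[j]) <= 2 * eps

def replace(x, x_prime, coords):
    """Copy of x with the listed coordinates taken from x_prime."""
    return [x_prime[c] if c in coords else x[c] for c in range(len(x))]

def max_violation(n, eps, trials, seed=0):
    """Largest |f(x) - f(x^j) - f(x^i) + f(x^ij)| over sampled non-adjacent pairs."""
    rng = random.Random(seed)
    worst, cases = 0.0, 0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        xp = [rng.random() for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                vecs = [x, replace(x, xp, {i}), replace(x, xp, {j}), replace(x, xp, {i, j})]
                if any(is_edge(z, i, j, eps) for z in vecs):
                    continue
                cases += 1
                fx, fi, fj, fij = (covered_length(z, eps) for z in vecs)
                worst = max(worst, abs(fx - fj - fi + fij))
    return worst, cases

worst, cases = max_violation(n=6, eps=0.05, trials=50)
print(worst, cases)  # worst is at the level of floating-point rounding
```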

3.4. A CLT for nearest-neighbor statistics. In a well-known 1983 paper,
Bickel and Breiman [6] proved a central limit theorem for functionals of the
form

$$\frac{1}{\sqrt{n}}\sum_{\ell=1}^n \bigl(h(X_\ell,D_\ell) - \mathbb{E}h(X_\ell,D_\ell)\bigr), \tag{4}$$

where $X_1,\ldots,X_n$ are i.i.d. random vectors following a probability density
that is bounded and continuous on its support, $D_\ell := \min_{j \ne \ell}\|X_\ell - X_j\|$
is the distance between $X_\ell$ and its nearest neighbor and $h$ is a uniformly
bounded and a.e. continuous function. Although the result looks very plausible,
the proof is daunting. Indeed, as the authors put it, "Our proof is
long. We believe that this is due to the complexity of the problem." In short,
their method can be described as a difficult multidimensional generalization
of the familiar big-blocks-small-blocks method for analyzing m-dependent
sequences.

Note that the existence of a density in the Bickel-Breiman theorem is a 
more restrictive assumption than it looks. For example, it precludes the pos- 
sibility that the random variables are supported on some lower dimensional 
manifold, which may be quite important from a practical point of view. 

In another widely cited work, Avram and Bertsimas [2] combined the 
Bickel-Breiman approach with the dependency graph technique of Baldi 
and Rinott [3] to yield CLTs for sums of edge lengths in various graphs 
arising from geometrical probability. A different method, originating from 
the work of Kesten and Lee [29], was used by Penrose and Yukich [32] to 
obtain a general CLT (with Kolmogorov distance error bound) for certain 
translation invariant functionals of uniformly distributed points and Poisson 
processes. 

We have the following generalization of the Bickel-Breiman result, which,
among other things, does away with the assumption that the $X_i$'s have a
density with respect to Lebesgue measure. We also have an error bound,
explicit up to a universal constant.

Theorem 3.4. Fix $n \ge 4$, $d \ge 1$, and $k \ge 1$. Suppose $X_1,\ldots,X_n$ are
i.i.d. $\mathbb{R}^d$-valued random vectors with the property that $\|X_1 - X_2\|$ is a
continuous random variable. Let $f : (\mathbb{R}^d)^n \to \mathbb{R}$ be a function of the form

$$f(x_1,\ldots,x_n) = \frac{1}{\sqrt{n}}\sum_{\ell=1}^n f_\ell(x_1,\ldots,x_n), \tag{5}$$

where, for each $\ell$, $f_\ell(x_1,\ldots,x_n)$ is a function of only $x_\ell$ and its $k$ nearest
neighbors. Suppose, for some $p \ge 8$, that $\gamma_p := \max_\ell \mathbb{E}|f_\ell(X_1,\ldots,X_n)|^p$ is
finite. Let $W = f(X_1,\ldots,X_n)$ and $\sigma^2 = \mathrm{Var}(W)$. We then have the bound

$$\delta_W \le C\,\frac{\alpha(d)^3k^4\gamma_p^{2/p}}{\sigma^2n^{(p-8)/2p}} + C\,\frac{\alpha(d)^3k^3\gamma_p^{3/p}}{\sigma^3n^{(p-6)/2p}},$$

where $\delta_W$ is the distance to normality defined in Definition 2.1, $\alpha(d)$ is the
minimum number of 60° cones at the origin required to cover $\mathbb{R}^d$ and $C$ is
a universal constant.

Remarks. (i) The assumption that the distribution of $\|X_1 - X_2\|$ does
not have point masses is the bare minimal condition required to guarantee
that the pairwise distances are all different (so that the nearest-neighbor
orderings are uniquely defined). We believe that it is impossible to employ the
big-blocks-small-blocks method under this minimal assumption, although it
may be possible to formulate a version of the method that works when the
$X_i$'s are supported on a sufficiently nice manifold.

(ii) The assumption concerning the $f_\ell$'s is also very weak. Unlike the
Bickel-Breiman theorem, we do not require boundedness or continuity. Moreover,
we do not even assume that the $f_\ell$'s are functions of only nearest-neighbor
distances; they can be arbitrary functions of the nearest neighbors.

(iii) Like Theorem 2.2, the above result suffers from the deficiency that
it does not say anything about $\sigma^2$. Again, as before, we think of that as a
separate problem.

Some applications. (i) Vertex degree in a geometric graph. For a fixed
$\varepsilon > 0$ and a given collection of points $x = (x_1,\ldots,x_n)$ in $\mathbb{R}^d$, the geometric
graph $G(x,\varepsilon)$ is the graph on $x$ that puts edges between all pairs of vertices
that are $\le \varepsilon$ distance apart. Replacing $x$ by a collection $X = (X_1,\ldots,X_n)$ of
i.i.d. random vectors, let $N_k$ be the number of points having vertex degree
at least $k$ (where $k$ is fixed). This problem can be put in the context of
Theorem 3.4 by defining $f_\ell(x) = 1$ if the distance between $x_\ell$ and its $k$th
nearest neighbor is $\le \varepsilon$, and $f_\ell(x) = 0$ otherwise. Then, $N_k = \sum_{\ell=1}^n f_\ell(X)$.
Suppose all other terms are defined as in the statement of Theorem 3.4.
Clearly, $\gamma_p \le 1$ for all $p \ge 1$. Hence, we can take $p \to \infty$ and get



where $\sigma^2 = \mathrm{Var}(N_k)$, $\delta_{N_k}$ is the distance to normality defined in Definition
2.1 and $C$ is a constant depending on dimension $d$ and the distribution
of the $X_\ell$'s. If $\varepsilon$ grows with $n$ at such a rate that $\sigma^2$ does not collapse to zero,
then we get an $O(n^{-1/2})$ error bound for the Wasserstein distance. Incidentally,
this example is quite well understood (see, e.g., Chapter 4 of [31]).
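
The reduction of the degree count to a sum of local indicators can be made concrete as follows. The sketch computes $N_k$ in two ways, once as $\sum_\ell f_\ell(x)$ with $f_\ell$ the indicator that the $k$th nearest-neighbor distance of $x_\ell$ is at most $\varepsilon$, and once by counting vertices of degree at least $k$ in the geometric graph. The uniform sample in the unit square, the parameter values and the function names are assumptions chosen for illustration.

```python
import math
import random

def kth_nn_distance(points, ell, k):
    """Distance from points[ell] to its k-th nearest neighbor among the other points."""
    dists = sorted(math.dist(points[ell], q) for m, q in enumerate(points) if m != ell)
    return dists[k - 1]

def count_via_indicators(points, k, eps):
    """N_k = sum over ell of f_ell(x), where f_ell = 1 iff the k-th NN distance is <= eps."""
    return sum(kth_nn_distance(points, ell, k) <= eps for ell in range(len(points)))

def count_via_degrees(points, k, eps):
    """N_k computed directly: number of vertices of G(x, eps) with degree >= k."""
    n = len(points)
    degrees = [
        sum(m != ell and math.dist(points[ell], points[m]) <= eps for m in range(n))
        for ell in range(n)
    ]
    return sum(deg >= k for deg in degrees)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
print(count_via_indicators(pts, k=3, eps=0.08), count_via_degrees(pts, k=3, eps=0.08))
```

Each $f_\ell$ depends only on $x_\ell$ and its $k$ nearest neighbors, which is the structural property that Theorem 3.4 requires.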

(ii) Average nearest-neighbor distance. Suppose $X_1,\ldots,X_n$ are i.i.d. random
vectors in $\mathbb{R}^d$. Let $D_\ell$ be the distance of $X_\ell$ to its nearest neighbor and let
$\bar{D} = \frac{1}{n}\sum_{\ell=1}^n D_\ell$ be the average nearest-neighbor distance. Assume that the
support of the distribution of the $X_i$'s is $m$-dimensional, in the sense that
the mass of $\varepsilon$-balls around any point is $\asymp \varepsilon^m$ as $\varepsilon \to 0$. Although a CLT for
$\bar{D}$ could be proven using the Bickel-Breiman result if the $X_i$'s had a density
with respect to Lebesgue measure, it does not work if we only assume that
$\|X_1 - X_2\|$ has a continuous distribution.



Let $f_\ell = n^{1/m}D_\ell$ and $f = n^{-1/2}\sum_{\ell=1}^n f_\ell$. Then, for all $\varepsilon > 0$, we clearly have



It follows that there is a constant $L \ge 1$ such that $\gamma_p \le (Lp)^p$ for all $p \ge 1$.
Along the same lines, it is not difficult to show that $\sigma^2 := \mathrm{Var}(f(X)) \asymp 1$ as
$n \to \infty$. Taking $p = \log n$, we get the bound

$$\delta_{\bar{D}} \le \frac{C(\log n)^3}{\sqrt{n}},$$

where $\delta_{\bar{D}}$ is the distance to normality defined in Definition 2.1 and $C$ is a
constant depending on the dimension $d$ and the distribution of the $X_\ell$'s.

(iii) The Levina-Bickel statistic. In the preceding examples, we see that
the error bound is effectively $O(n^{-1/2})$ when the summands have light tails.
However, the $f_\ell$'s may be heavy-tailed in applications. A specific example of
such a function is the recent "dimension estimator" of Levina and Bickel [30]
which uses the distances to the first $k$ nearest neighbors to obtain an estimate
of the so-called intrinsic dimension of a statistical data cloud. Explicitly, if
$X_1,\ldots,X_n$ are i.i.d. random variables lying on a nice manifold of unknown
dimension $m$ embedded in a higher-dimensional space $\mathbb{R}^d$, and $k$ is a positive
integer $\ge 2$, then the Levina-Bickel estimate of $m$ with tuning parameter $k$
is given by the formula
(6) ihk = 

where $D_{\ell j}$ is the distance between $X_\ell$ and its $j$th nearest neighbor. In (6),
we have $f_\ell(x) = (k-1)/g_\ell(x)$, where

and $D_{\ell j}(x)$ is the distance between $x_\ell$ and its $j$th nearest neighbor in the
collection $x = (x_1,\ldots,x_n)$. It is argued in [30] that for large $n$, under appropriate
assumptions, the distribution of $m \cdot g_\ell(X)$ can be approximated by
the Gamma($k$, 1) distribution (recall that $m$ is the dimension of the manifold
on which the data lie). It follows that




where $C$ is a constant that does not depend on $k$, $n$ and $m$. Putting $p = k - 1$
in Theorem 3.4, we get

$$\delta_{\hat{m}_k} \le C\,\frac{\alpha(d)^3k^3m^2(k\sigma + m)}{\sigma^3n^{(k-9)/(2k-2)}},$$

where $\sigma^2 = \mathrm{Var}(\sqrt{n}(\hat{m}_k - \mathbb{E}\hat{m}_k))$ and $\delta_{\hat{m}_k}$ is the distance to normality defined
in Definition 2.1. Levina and Bickel ([30], Section 3) claim that for fixed $k$,
they have a proof that $\sigma^2 \asymp 1$ as $n \to \infty$. This, combined with the above
bound, implies a CLT for the Levina-Bickel statistic for $k > 9$.
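
For readers who want to experiment with the statistic, here is a minimal sketch of a nearest-neighbor dimension estimate of the form $\frac{1}{n}\sum_\ell (k-1)/g_\ell(x)$. It assumes that $g_\ell(x) = \sum_{j=1}^{k-1} \log(D_{\ell k}(x)/D_{\ell j}(x))$, which is the form of the estimator in Levina and Bickel's paper; that formula, the synthetic data (points on a segment in $\mathbb{R}^3$, so $m = 1$) and the function name are assumptions of this example rather than statements taken from the text above.

```python
import math
import random

def levina_bickel(points, k):
    """Average over ell of (k - 1) / g_ell, with g_ell = sum_{j<k} log(D_{ell,k} / D_{ell,j})."""
    estimates = []
    for ell, p in enumerate(points):
        # Sorted distances from x_ell to all other points: d[j-1] is the j-th NN distance.
        d = sorted(math.dist(p, q) for m, q in enumerate(points) if m != ell)
        g = sum(math.log(d[k - 1] / d[j]) for j in range(k - 1))
        estimates.append((k - 1) / g)
    return sum(estimates) / len(estimates)

rng = random.Random(0)
# Hypothetical data: 400 points on a 1-dimensional segment embedded in R^3.
direction = (1.0, 2.0, -0.5)
segment = [tuple(t * c for c in direction) for t in (rng.random() for _ in range(400))]
estimate = levina_bickel(segment, k=10)
print(estimate)  # should be of order 1, the intrinsic dimension of the segment
```

The summands $(k-1)/g_\ell$ are reciprocals of approximately Gamma-distributed quantities, which is the source of the heavy tails mentioned above.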

Proof of Theorem 3.4. For each $x = (x_1,\ldots,x_n) \in (\mathbb{R}^d)^n$, define a
function $d_x$ on $[n] \times [n]$ as

$$d_x(i,j) := \#\{\ell : \|x_\ell - x_i\| < \|x_i - x_j\|\}. \tag{7}$$

Our first task is to identify an interaction rule for functions of the form (5).
Suppose $k$ is a fixed positive integer. Given any $x \in (\mathbb{R}^d)^n$, let $G(x)$ be the
graph on $[n]$ that puts an edge between $i$ and $j$ if and only if there exists
an $\ell$ such that $d_x(\ell,i) \le k + 1$ and $d_x(\ell,j) \le k + 1$. We claim that $G$ is a
symmetric interaction rule for $f$.

To prove this claim, we begin with a simple observation: if $x, x' \in (\mathbb{R}^d)^n$
and $\ell, m \in [n]$ are such that $x_\ell = x'_\ell$ and $x_m = x'_m$, then

$$|d_x(\ell,m) - d_{x'}(\ell,m)| \le \#\{r : x_r \ne x'_r\}. \tag{8}$$

Now, fix some $x, x' \in (\mathbb{R}^d)^n$ and $i, j \in [n]$, where $i \ne j$. Define $x^i$, $x^j$ and
$x^{ij}$ as in the definition of interaction between coordinates in Section 2.3.
Suppose $\{i,j\}$ is not an edge in $G(x)$, $G(x^i)$, $G(x^j)$ and $G(x^{ij})$. We will
show that for every $\ell$,

$$f_\ell(x) - f_\ell(x^j) - f_\ell(x^i) + f_\ell(x^{ij}) = 0. \tag{9}$$

So, let us fix some $\ell \in [n]$. First, suppose that

$$d_x(\ell,j) \le k. \tag{10}$$

We claim that in this situation,

$$f_\ell(x) = f_\ell(x^i) \quad\text{and}\quad f_\ell(x^j) = f_\ell(x^{ij}). \tag{11}$$

To show that, first note that since $\{i,j\} \notin G(x)$, we have

$$d_x(\ell,i) > k + 1. \tag{12}$$

In particular, $i$ is different from $\ell$ and $j$. Thus, using (8) and (10), we see that
$d_{x^i}(\ell,j) \le k + 1$. Combining this with the hypothesis that $\{i,j\} \notin G(x^i)$, we
get

$$d_{x^i}(\ell,i) > k + 1. \tag{13}$$

From (12) and (13), it is easy to deduce that $x_\ell$ has the same set of $k$ nearest
neighbors in both $x$ and $x^i$, hence that $f_\ell(x) = f_\ell(x^i)$.

Next, still assuming (10), suppose that $d_{x^j}(\ell,i) \le k$. We show that this is
impossible by considering two cases: (i) if $j = \ell$, this is clearly false because
$\{i,j\} \notin G(x^j)$; (ii) if $j \ne \ell$, then by (8) and (12), we have $d_{x^j}(\ell,i) \ge k+1$.
Thus, under (10), we must have

(14) $d_{x^j}(\ell,i) \ge k+1$.

Finally, still under (10), suppose we have $d_{x^{ij}}(\ell,i) \le k$. Again, we show that
this cannot be true under (10) by considering two cases: (i) if $\ell = j$, this
cannot hold because $\{i,j\} \notin G(x^{ij})$; (ii) if $\ell \ne j$, then from (8) and (13), we
get $d_{x^{ij}}(\ell,i) \ge k+1$. Thus, under (10), we have

(15) $d_{x^{ij}}(\ell,i) \ge k+1$.

From (14) and (15), it follows that $x_\ell$ has the same set of $k$ nearest neighbors
in $x^j$ and $x^{ij}$. Therefore, $f_\ell(x^j) = f_\ell(x^{ij})$. This completes the proof of (11)
under the hypothesis (10).

The symmetry in the problem now implies that (11) holds if $d_{x^i}(\ell,j) \le k$,
$d_{x^j}(\ell,j) \le k$ or $d_{x^{ij}}(\ell,j) \le k$. If none of these are true [i.e., $d_z(\ell,j) > k$
for $z = x, x^i, x^j, x^{ij}$], then we can directly deduce that the set of $k$ nearest
neighbors of $x_\ell$ is the same in $x$ and $x^j$ and (separately) also in $x^i$ and $x^{ij}$,
therefore

$f_\ell(x) = f_\ell(x^j)$ and $f_\ell(x^i) = f_\ell(x^{ij})$.

Combining the cases, the proof of (9) is now complete.

Thus, we have proven the claim that $G$ is an interaction rule for $f$. Clearly,
$G$ is symmetric. A symmetric extension of $G$ to $(\mathbb{R}^d)^{n+4}$ is easily constructed
as follows. Given any vector $x \in (\mathbb{R}^d)^{n+4}$, let $G'(x)$ be the graph on $[n+4]$
that puts an edge between $i$ and $j$ if and only if there exists an $\ell \in [n+4]$
such that $d_x(\ell,i) \le k+5$ and $d_x(\ell,j) \le k+5$. To see that $G'$ extends $G$, note
that if $\{i,j\} \in G(x_1, \dots, x_n)$, then there exists some $\ell$ such that $x_i$ and $x_j$ are
both among the $k+1$ nearest neighbors of $x_\ell$ in the set $\{x_1, \dots, x_n\}$. After the
addition of four more points to this set, $x_i$ and $x_j$ will still be members of the set of
$k+5$ nearest neighbors of $x_\ell$. This proves that $G'$ is an extension of $G$, and
it is obviously symmetric.
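
The claim that $G$ is an interaction rule can be checked numerically on small examples. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the rank function (7) with a strict inequality, uses 0-based indices, and picks as local statistic the sum of the distances from $x_\ell$ to its $k$ nearest neighbors. The helper names (`ranks`, `is_edge`, `f_local`, `swap`, `check_interaction_rule`) are hypothetical.

```python
import itertools
import math
import random

def ranks(x):
    """d[l][j] = #{r : |x_l - x_r| < |x_l - x_j|}, as in (7)."""
    n = len(x)
    d = []
    for l in range(n):
        dist = [math.dist(x[l], x[r]) for r in range(n)]
        d.append([sum(1 for r in range(n) if dist[r] < dist[j]) for j in range(n)])
    return d

def is_edge(x, i, j, k):
    """{i, j} is an edge of G(x) iff some l has both i and j at rank <= k + 1."""
    d = ranks(x)
    return any(d[l][i] <= k + 1 and d[l][j] <= k + 1 for l in range(len(x)))

def f_local(x, l, k):
    """Example local statistic (an assumption): sum of the distances from x_l
    to its k nearest neighbors."""
    dist = sorted(math.dist(x[l], x[r]) for r in range(len(x)) if r != l)
    return sum(dist[:k])

def swap(x, x_prime, idx):
    """Replace the coordinates listed in idx by those of x_prime."""
    return [x_prime[r] if r in idx else x[r] for r in range(len(x))]

def check_interaction_rule(n=15, k=1, seed=0):
    """Return (number of pairs tested, largest violation of identity (9))."""
    rng = random.Random(seed)
    x = [(rng.random(), rng.random()) for _ in range(n)]
    x_prime = [(rng.random(), rng.random()) for _ in range(n)]
    checked, worst = 0, 0.0
    for i, j in itertools.combinations(range(n), 2):
        configs = [x, swap(x, x_prime, {i}), swap(x, x_prime, {j}),
                   swap(x, x_prime, {i, j})]
        if any(is_edge(cfg, i, j, k) for cfg in configs):
            continue  # (9) is only claimed when {i, j} is a non-edge in all four graphs
        checked += 1
        for l in range(n):
            a, b, c, e = (f_local(cfg, l, k) for cfg in configs)
            worst = max(worst, abs(a - b - c + e))
    return checked, worst

checked, worst = check_interaction_rule()
print(checked, worst)
```

For every pair that is a non-edge in all four graphs, the second-order difference in (9) should vanish up to floating-point rounding.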

Now, for every $x \in (\mathbb{R}^d)^n$ and $1 \le j \le n$, let

$N_j(x) := \{\ell : d_x(\ell,j) \le k\}$.

As we have noted before, if $\ell \notin N_j(x) \cup N_j(x^j)$, then $x_\ell$ has the same set of
$k$ nearest neighbors in both $x$ and $x^j$, therefore $f_\ell(x) = f_\ell(x^j)$. Thus,

$f(x) - f(x^j) = n^{-1/2} \sum_{\ell \in N_j(x) \cup N_j(x^j)} (f_\ell(x) - f_\ell(x^j))$.

It follows from standard geometrical arguments (see, e.g., [42], page 102) and
the assumption that $\|X_1 - X_2\|$ is a continuous r.v. that $|N_j(x) \cup N_j(x^j)| \le
2\alpha(d)k$, irrespective of $n$ and $x$, where $\alpha(d)$ is the minimum number of 60°
cones at the origin required to cover $\mathbb{R}^d$. Thus, if we let

$M_f := \max_\ell |f_\ell(X)| \vee \max_{\ell,j} |f_\ell(X^j)|$,

then the random variable $M$ in the statement of Theorem 2.5 can be bounded
by $4n^{-1/2}\alpha(d)kM_f$ in this problem. Next, note that for any $p \ge 8$,



$E(M_f^8) \le [E(M_f^p)]^{8/p}$



< 



Similarly, one can show that $E|\Delta_j f(X)|^3 \le C\alpha(d)^3k^3n^{-3/2}(n\gamma_p)^{3/p}$. Finally,
note that by the same geometrical observation as mentioned before, the
maximum degree of $G'(X)$ is bounded by $\alpha(d)(k+1)(k+5)$. The proof
is now completed by combining the bounds for all the terms and using
Theorem 2.5. □



4. Proofs of the main results. 



4.1. Proof of Theorem 2.2. Let us begin with the observation that, without
loss of generality, we can replace $f$ by $\sigma^{-1}f$ and then assume that $\sigma^2 = 1$.
Henceforth, we will work under that assumption. The argument is divided 
into a sequence of lemmas. Lemmas 2.3 and 2.4 (already stated in Section 2) 
and Lemma 4.1 are original contributions of this paper, while Lemma 4.2 
goes back to Stein [41]. 



Proof of Lemma 2.3. Consider the sum

$\sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} \Delta_j f(X^A)$.

Clearly, this is a linear combination of $\{f(X^A), A \subseteq [n]\}$. It is a matter of
simple verification that the positive and negative coefficients of $f(X^A)$ in
this linear combination cancel out except when $A = [n]$ or $A = \emptyset$. In fact,
the above expression is identically equal to $f(X) - f(X')$.

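Now, fix $A$ and $j \notin A$, and let $U = g(X)\Delta_j f(X^A)$. $U$ is then a function
of the random vectors $X$ and $X'$. The joint distribution of $(X,X')$ remains
unchanged if we interchange $X_j$ and $X'_j$. Under this operation, $U$ changes
to $U' := -g(X^j)\Delta_j f(X^A)$. Thus,

$E(U) = E(U') = \frac{1}{2}E(U + U') = \frac{1}{2}E[\Delta_j g(X)\Delta_j f(X^A)]$.

Combining these observations, we get

$\mathrm{Cov}(g(X), f(X)) = E[g(X)(f(X) - f(X'))]$
$= \frac{1}{2} \sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} E[\Delta_j g(X)\Delta_j f(X^A)]$.
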


This completes the proof of the lemma. □ 
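
The covariance identity just proved can be verified exactly on a small example. The following sketch is only an illustration: it assumes $X$ and $X'$ are independent and uniform on $\{0,1\}^3$, uses two arbitrary test functions `f` and `g`, and compares $\mathrm{Cov}(g(X), f(X))$ with $\frac{1}{2}\sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)}\sum_{j \notin A} E[\Delta_j g(X)\Delta_j f(X^A)]$ using exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def swap(x, xp, A):
    """Return x^A: coordinates in A are taken from the independent copy xp."""
    return tuple(xp[r] if r in A else x[r] for r in range(len(x)))

def delta(func, j, x, xp, A=frozenset()):
    """Delta_j func evaluated at x^A, i.e. func(x^A) - func(x^{A u {j}})."""
    return func(swap(x, xp, A)) - func(swap(x, xp, A | {j}))

def lemma_2_3_sides(f, g, n):
    """Exact Cov(g(X), f(X)) and the weighted sum from Lemma 2.3 when
    X and X' are independent and uniform on {0,1}^n."""
    cube = list(product((0, 1), repeat=n))
    p = Fraction(1, len(cube))
    mean_f = sum(p * f(x) for x in cube)
    mean_g = sum(p * g(x) for x in cube)
    cov = sum(p * (f(x) - mean_f) * (g(x) - mean_g) for x in cube)
    rhs = Fraction(0)
    for size in range(n):  # sizes of the proper subsets A of [n]
        weight = Fraction(1, comb(n, size) * (n - size))
        for A in map(frozenset, combinations(range(n), size)):
            for j in set(range(n)) - A:
                expect = sum(p * p * delta(g, j, x, xp) * delta(f, j, x, xp, A)
                             for x in cube for xp in cube)
                rhs += weight * expect
    return cov, rhs / 2

f = lambda x: x[0] * x[1] + 2 * x[2] - x[0] * x[2]
g = lambda x: max(x) + 3 * x[1] * x[2]
cov, rhs = lemma_2_3_sides(f, g, 3)
print(cov, rhs)
```

Because the arithmetic is exact, the two printed values should coincide, not merely agree approximately.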

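Proof of Lemma 2.4. For each $A \subseteq [n]$ and $j \notin A$, let

$R_{A,j} := \Delta_j(\varphi \circ f)(X)\,\Delta_j f(X^A)$

and

$\tilde R_{A,j} := \varphi'(f(X))\,\Delta_j f(X)\,\Delta_j f(X^A)$.

By Lemma 2.3 with $g = \varphi \circ f$, we have

(16) $E(\varphi(W)W) = \frac{1}{2} \sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} E(R_{A,j})$.

By the mean value theorem, we have

(17) $E|R_{A,j} - \tilde R_{A,j}| \le \frac{\|\varphi''\|_\infty}{2} E|(\Delta_j f(X))^2 \Delta_j f(X^A)|$
     $\le \frac{\|\varphi''\|_\infty}{2} E|\Delta_j f(X)|^3$ (by Hölder's inequality).

Now, from the definition of $T$, we have

(18) $E(\varphi'(W)T) = \frac{1}{2} \sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} E(\tilde R_{A,j})$.

Combining (16), (17) and (18), we get

$|E(\varphi(W)W) - E(\varphi'(W)T)|$
$= \frac{1}{2} \left| \sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} E(R_{A,j} - \tilde R_{A,j}) \right|$
$\le \frac{\|\varphi''\|_\infty}{4} \sum_{A \subsetneq [n]} \frac{1}{\binom{n}{|A|}(n-|A|)} \sum_{j \notin A} E|\Delta_j f(X)|^3$
$= \frac{\|\varphi''\|_\infty}{4} \sum_{j=1}^n E|\Delta_j f(X)|^3$.

This completes the proof of the lemma. □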
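
The final equality in the proof of Lemma 2.4 relies on a counting fact: for each fixed $j$, the weights $1/(\binom{n}{|A|}(n-|A|))$, summed over all subsets $A$ that do not contain $j$, add up to 1. The short sketch below checks this by brute force for small $n$. It is an illustration only, and the function name `total_weight` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def total_weight(n, j=0):
    """Sum of 1 / (C(n, |A|) * (n - |A|)) over all subsets A of {0, ..., n-1}
    that do not contain j."""
    others = [r for r in range(n) if r != j]
    total = Fraction(0)
    for size in range(n):  # |A| ranges over 0, ..., n-1
        for A in combinations(others, size):
            total += Fraction(1, comb(n, len(A)) * (n - len(A)))
    return total

print([total_weight(n) for n in range(1, 8)])
```

The reason is that there are $\binom{n-1}{a}$ such sets of size $a$, and $\binom{n-1}{a}/(\binom{n}{a}(n-a)) = 1/n$ for each of the $n$ possible sizes.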
Lemma 4.1. Let $W$ be as above. For any $\varphi \in C^2(\mathbb{R})$ such that $\|\varphi'\|_\infty \le 1$
and $\|\varphi''\|_\infty \le 2$, we have

$|E(\varphi(W)W) - E(\varphi'(W))| \le [\mathrm{Var}(E(T|W))]^{1/2} + \frac{1}{2} \sum_{j=1}^n E|\Delta_j f(X)|^3$.

Proof. Note that by putting $g = f$ in Lemma 2.3, we get $E(T) =
E(W^2) = 1$. Since $\|\varphi'\|_\infty \le 1$, this gives

$|E(\varphi'(W)T) - E(\varphi'(W))| \le E|E(T|W) - 1| \le [\mathrm{Var}(E(T|W))]^{1/2}$.

The proof is completed by applying Lemma 2.4. □

Lemma 4.2. Suppose $h : \mathbb{R} \to \mathbb{R}$ is an absolutely continuous function with
bounded derivative. Let $Z \sim N(0,1)$. There then exists a solution to the
differential equation

$\varphi'(x) - x\varphi(x) = h(x) - Eh(Z)$

that satisfies $\|\varphi'\|_\infty \le \sqrt{2/\pi}\,\|h'\|_\infty$ and $\|\varphi''\|_\infty \le 2\|h'\|_\infty$.



Remark. It is not difficult to show that both constants are sharp. For
a different proof of the bound on $\|\varphi'\|_\infty$, see Lemma 1 in [33]. The bound
on $\|\varphi''\|_\infty$ is due to Stein ([41], page 27). Easier proofs with suboptimal
constants can be found in Chen and Shao ([14], Chapter 1, Lemma 2.3).

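Proof of Lemma 4.2. It can be verified that the function

$\varphi(x) = e^{x^2/2} \int_{-\infty}^x e^{-t^2/2}(h(t) - Eh(Z))\,dt$
$= -e^{x^2/2} \int_x^\infty e^{-t^2/2}(h(t) - Eh(Z))\,dt$

is a solution. Stein ([41], page 25, Lemma 3) proves that $\|\varphi''\|_\infty \le 2\|h'\|_\infty$.
The inequality $\|\varphi'\|_\infty \le \sqrt{2/\pi}\,\|h'\|_\infty$ can also be derived using Stein's proof of the
other inequality. We carry out the steps below. First, it is easy to verify that

$h(x) - Eh(Z) = \int_{-\infty}^x h'(z)\Phi(z)\,dz - \int_x^\infty h'(z)(1 - \Phi(z))\,dz$,

where $\Phi$ is the standard Gaussian c.d.f. Again, as proven in Stein ([41],
page 27),

$\varphi(x) = -\sqrt{2\pi}\,e^{x^2/2}(1 - \Phi(x)) \int_{-\infty}^x h'(z)\Phi(z)\,dz$
$- \sqrt{2\pi}\,e^{x^2/2}\Phi(x) \int_x^\infty h'(z)(1 - \Phi(z))\,dz$.

Combining, we see that

$\varphi'(x) = x\varphi(x) + h(x) - Eh(Z)$
$= (1 - \sqrt{2\pi}\,xe^{x^2/2}(1 - \Phi(x))) \int_{-\infty}^x h'(z)\Phi(z)\,dz$
$- (1 + \sqrt{2\pi}\,xe^{x^2/2}\Phi(x)) \int_x^\infty h'(z)(1 - \Phi(z))\,dz$.

It follows that

$\|\varphi'\|_\infty \le \|h'\|_\infty \sup_{x \in \mathbb{R}} \Big( |1 - \sqrt{2\pi}\,xe^{x^2/2}(1 - \Phi(x))| \int_{-\infty}^x \Phi(z)\,dz$
$+ |1 + \sqrt{2\pi}\,xe^{x^2/2}\Phi(x)| \int_x^\infty (1 - \Phi(z))\,dz \Big)$.

Using integration by parts, we get

$\int_{-\infty}^x \Phi(z)\,dz = x\Phi(x) + \frac{e^{-x^2/2}}{\sqrt{2\pi}}$

and

$\int_x^\infty (1 - \Phi(z))\,dz = -x(1 - \Phi(x)) + \frac{e^{-x^2/2}}{\sqrt{2\pi}}$.

Thus, we have

$\|\varphi'\|_\infty \le \|h'\|_\infty \sup_{x \in \mathbb{R}} \Big( |1 - \sqrt{2\pi}\,xe^{x^2/2}(1 - \Phi(x))| \Big( x\Phi(x) + \frac{e^{-x^2/2}}{\sqrt{2\pi}} \Big)$
$+ |1 + \sqrt{2\pi}\,xe^{x^2/2}\Phi(x)| \Big( -x(1 - \Phi(x)) + \frac{e^{-x^2/2}}{\sqrt{2\pi}} \Big) \Big)$.

It is a calculus exercise to verify that the term inside the brackets attains
its maximum at $x = 0$, where its value is $\sqrt{2/\pi}$. □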
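
The closing calculus exercise can be sanity-checked numerically. The sketch below evaluates the bracketed expression, written with $\Phi$ the standard Gaussian c.d.f. and the two integrals replaced by their closed forms $x\Phi(x) + e^{-x^2/2}/\sqrt{2\pi}$ and $-x(1-\Phi(x)) + e^{-x^2/2}/\sqrt{2\pi}$, on a grid over $[-5, 5]$ and locates its maximum. The grid range and step are arbitrary choices, and a grid search is of course not a proof.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard Gaussian c.d.f., computed via erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bracket(x):
    """The quantity inside the supremum in the final bound on sup |phi'|."""
    density = math.exp(-x * x / 2.0) / SQRT_2PI
    upper_tail = Phi(-x)  # equals 1 - Phi(x)
    scale = SQRT_2PI * x * math.exp(x * x / 2.0)
    first = abs(1.0 - scale * upper_tail) * (x * Phi(x) + density)
    second = abs(1.0 + scale * Phi(x)) * (-x * upper_tail + density)
    return first + second

grid = [t / 1000.0 for t in range(-5000, 5001)]
values = [bracket(x) for x in grid]
best = max(range(len(grid)), key=values.__getitem__)
print(grid[best], values[best], math.sqrt(2.0 / math.pi))
```

On this grid the maximizer should be $x = 0$, with maximal value agreeing with $\sqrt{2/\pi}$ to floating-point accuracy.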

Proof of Theorem 2.2. Take any $h$ with $\|h'\|_\infty \le 1$. Let $\varphi$ be a
solution to $\varphi'(x) - x\varphi(x) = h(x) - Eh(Z)$. Then,

$Eh(W) - Eh(Z) = E(\varphi'(W)) - E(W\varphi(W))$.

By Lemma 4.2, $\|\varphi'\|_\infty \le 1$ and $\|\varphi''\|_\infty \le 2$. The proof is completed by
applying Lemma 4.1. □

4.2. Proof of Theorem 2.5. By Theorem 2.2, our task reduces to obtaining
a bound on $\mathrm{Var}(E(T|X))$, where $T$ is defined in (1). However, the
situation in Theorem 2.5 is too complex to admit a direct computation of the 
variance. To circumvent this problem, we will use the following well-known 
martingale bound for the variance of an arbitrary function of independent 
random variables. This is known as the Efron-Stein inequality in the statis- 
tics literature. 

Lemma 4.3 ([17, 39]). Let $Z = g(Y_1, \dots, Y_m)$ be a function of independent
random objects $Y_1, \dots, Y_m$. Let $Y'_i$ be an independent copy of $Y_i$,
$i = 1, \dots, m$. Then,

$\mathrm{Var}(Z) \le \frac{1}{2} \sum_{i=1}^m E[(g(Y_1, \dots, Y_{i-1}, Y'_i, Y_{i+1}, \dots, Y_m) - g(Y_1, \dots, Y_m))^2]$.
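
To make the Efron-Stein inequality concrete, here is a small exact check by enumeration. It is a sketch under toy assumptions: the $Y_i$ are taken i.i.d. uniform on a finite set, and the function name `efron_stein_sides` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def efron_stein_sides(g, support, m):
    """Return (Var(Z), Efron-Stein bound) exactly for Z = g(Y_1, ..., Y_m),
    where the Y_i are i.i.d. uniform on `support` (a toy assumption)."""
    points = list(product(support, repeat=m))
    p = Fraction(1, len(points))   # probability of each outcome y
    q = Fraction(1, len(support))  # probability of each resampled value
    mean = sum(p * g(y) for y in points)
    variance = sum(p * (g(y) - mean) ** 2 for y in points)
    bound = Fraction(0)
    for i in range(m):
        for y in points:
            for fresh in support:  # fresh plays the role of Y'_i
                swapped = y[:i] + (fresh,) + y[i + 1:]
                bound += p * q * (g(swapped) - g(y)) ** 2
    return variance, bound / 2

var_max, bound_max = efron_stein_sides(max, (0, 1, 2), 3)
var_sum, bound_sum = efron_stein_sides(sum, (0, 1, 2), 3)
print(var_max, bound_max, var_sum, bound_sum)
```

For a sum of independent variables the inequality holds with equality, which the second call illustrates. For the maximum it is a genuine upper bound.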

We will combine this inequality with another simple inequality that we 
were unable to locate in the literature. 

Lemma 4.4. If $X$ and $X'$ are independent random objects, then for any
square integrable function $U = g(X, X')$, we have the inequality

$\mathrm{Var}(E(U|X)) \le E(\mathrm{Var}(U|X'))$.

Proof. The proof is based on a simple application of Jensen's inequality.
We just note that by the independence of $X$ and $X'$, we have
$E(E(U|X')|X) = E(U)$ and therefore

$\mathrm{Var}(E(U|X)) = E(E(U|X) - E(U))^2$
$= E(E(U - E(U|X')|X))^2$
$\le E(U - E(U|X'))^2 = E(\mathrm{Var}(U|X'))$.

This completes the proof of the lemma. □ 
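
Lemma 4.4 can likewise be checked exactly on a finite example. In the sketch below, $X$ and $X'$ are assumed independent and uniform on small finite sets, and the function `g` is an arbitrary test case. The name `lemma_4_4_sides` is hypothetical.

```python
from fractions import Fraction

def lemma_4_4_sides(g, xs, xps):
    """Return (Var(E(U|X)), E(Var(U|X'))) exactly for U = g(X, X'), with X
    uniform on xs and X' uniform on xps, independent (toy assumptions)."""
    px, pxp = Fraction(1, len(xs)), Fraction(1, len(xps))
    mean = sum(px * pxp * g(x, xp) for x in xs for xp in xps)
    # Left side: variance of the conditional mean given X.
    cond_mean_x = {x: sum(pxp * g(x, xp) for xp in xps) for x in xs}
    lhs = sum(px * (cond_mean_x[x] - mean) ** 2 for x in xs)
    # Right side: average over X' of the conditional variance given X'.
    rhs = Fraction(0)
    for xp in xps:
        cond_mean = sum(px * g(x, xp) for x in xs)
        rhs += pxp * sum(px * (g(x, xp) - cond_mean) ** 2 for x in xs)
    return lhs, rhs

lhs, rhs = lemma_4_4_sides(lambda x, xp: x * x * xp + x - xp, (0, 1, 2), (0, 1, 3))
print(lhs, rhs)
```

When `g` depends on its first argument only, both sides reduce to $\mathrm{Var}(g(X))$, so the inequality cannot be improved in general.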

Now, recall the definitions of $\Delta_j$, $T_A$ and $T$ from Section 2, and the
normal approximation bound in terms of $\mathrm{Var}(E(T|X))$ in Theorem 2.2. We
will prove the following upper bound on $\mathrm{Var}(E(T_A|X))$.

Lemma 4.5. With everything defined as before, we have

$\mathrm{Var}(E(T_A|X)) \le CE(M^8)^{1/2}E(\delta^4)^{1/2}\sqrt{n(n - |A|)}$,

where $C$ is a universal constant.

This is a good place to declare the convention that throughout the remainder
of this section, $C$ will denote numerical constants that do not depend
on anything else and the value of $C$ may change from line to line.

Before proving Lemma 4.5, we need to finish an important task. 

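Proof of Theorem 2.5. Lemma 4.5, combined with Theorem 2.2,
completes the proof of Theorem 2.5 as follows. First, note that by the
definition (1) of $T$ and Minkowski's inequality, we have

$[\mathrm{Var}(E(T|X))]^{1/2} \le \frac{1}{2} \sum_{A \subsetneq [n]} \frac{[\mathrm{Var}(E(T_A|X))]^{1/2}}{\binom{n}{|A|}(n-|A|)}$.

Substituting the bound from Lemma 4.5 into the above expression, we get

$[\mathrm{Var}(E(T|X))]^{1/2} \le CE(M^8)^{1/4}E(\delta^4)^{1/4} \sum_{A \subsetneq [n]} \frac{(n(n-|A|))^{1/4}}{\binom{n}{|A|}(n-|A|)}$
$= CE(M^8)^{1/4}E(\delta^4)^{1/4}\, n^{1/4} \sum_{k=1}^n k^{-3/4}$
$\le Cn^{1/2}E(M^8)^{1/4}E(\delta^4)^{1/4}$.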

This completes the proof of Theorem 2.5. □ 
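
The rate $n^{1/2}$ in this last step comes from an elementary estimate. Assuming the weights $1/(\binom{n}{|A|}(n-|A|))$ from the definition of $T$ and a factor $(n(n-|A|))^{1/4}$ arising from the square root of the bound in Lemma 4.5, the weighted sum over proper subsets equals $n^{1/4}\sum_{k=1}^n k^{-3/4}$, which is at most $4n^{1/2}$ by comparison with an integral. The sketch below checks this numerically. The function name `weighted_sum` and the explicit constant 4 are choices made for this illustration.

```python
def weighted_sum(n):
    """Sum over proper subsets A of [n] of
    (n (n - |A|))^{1/4} / (C(n, |A|) (n - |A|))."""
    # The C(n, a) subsets of size a each carry weight 1 / (C(n, a) (n - a)),
    # so the binomial coefficients cancel and only the size a matters.
    return sum((n * (n - a)) ** 0.25 / (n - a) for a in range(n))

for n in (1, 10, 100, 1000):
    print(n, weighted_sum(n), 4 * n ** 0.5)
```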

Our main job now is to prove Lemma 4.5. Let us begin with a simple 
lemma about symmetric graphical rules. 

Lemma 4.6. Suppose $G$ is a symmetric graphical rule on $\mathcal{X}^n$ and $X =
(X_1, \dots, X_n)$ is a vector of independent and identically distributed
$\mathcal{X}$-valued random variables. Let $d_1$ be the degree of the vertex 1 in $G(X)$. Take
any $k \le n - 1$ and let $i, i_1, i_2, \dots, i_k$ be any collection of $k + 1$ distinct elements
of $[n]$. Then,

(19) $P(\{i, i_\ell\} \in G(X) \text{ for each } 1 \le \ell \le k) = \frac{E((d_1)_k)}{(n-1)_k}$,

where $(r)_k$ stands for the product $r(r-1)\cdots(r-k+1)$.

Proof. Since $G$ is a symmetric rule and the $X_i$'s are i.i.d., the quantity

$P(\{i, i_\ell\} \in G(X) \text{ for all } 1 \le \ell \le k)$

does not depend on the specific choice of $i, i_1, \dots, i_k$. Hence,

$P(\{i, i_\ell\} \in G(X) \text{ for each } 1 \le \ell \le k)$
$= \frac{1}{(n-1)_k} \sum P(\{i, j_\ell\} \in G(X) \text{ for each } 1 \le \ell \le k)$,

where the sum is taken over all choices of distinct $j_1, \dots, j_k$ in $[n] \setminus \{i\}$.
Finally, note that

$\sum I\{\{i, j_\ell\} \in G(X) \text{ for each } 1 \le \ell \le k\} = (d_i)_k$,

where $d_i$ is the degree of the vertex $i$. Again, by symmetry, $d_i$ and $d_1$ have
the same distribution. This completes the argument. □
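
Identity (19) can be confirmed by exhaustive enumeration for a toy symmetric rule. The sketch below is an illustration under stated assumptions: the $X_i$ are i.i.d. uniform on a four-point set, and the rule joins $i$ and $j$ whenever $|x_i - x_j| \le 1$. This rule is a hypothetical example, not one from the paper, and indices are 0-based, so vertex 0 plays the role of vertex 1.

```python
from fractions import Fraction
from itertools import product

def falling(r, k):
    """Falling factorial (r)_k = r (r - 1) ... (r - k + 1)."""
    out = 1
    for t in range(k):
        out *= r - t
    return out

def edge(x, i, j):
    """Toy symmetric graphical rule (hypothetical): join i and j when
    |x_i - x_j| <= 1."""
    return i != j and abs(x[i] - x[j]) <= 1

def lemma_4_6_sides(n, k, support=(0, 1, 2, 3)):
    """Exact left and right sides of (19) for the toy rule above."""
    points = list(product(support, repeat=n))
    p = Fraction(1, len(points))
    # Left side: vertex 0 is joined to each of the vertices 1, ..., k.
    lhs = sum(p for x in points if all(edge(x, 0, l) for l in range(1, k + 1)))
    # Right side: E[(d)_k] / (n - 1)_k, with d the degree of vertex 0.
    rhs = sum(p * falling(sum(edge(x, 0, j) for j in range(n)), k) for x in points)
    return lhs, rhs / falling(n - 1, k)

lhs, rhs = lemma_4_6_sides(4, 2)
print(lhs, rhs)
```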

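Proof of Lemma 4.5. Fix a set $A \subseteq [n]$. For each $j \notin A$, let

$R_j := \Delta_j f(X)\Delta_j f(X^A)$
$= (f(X) - f(X^j))(f(X^A) - f(X^{A \cup j}))$.

Now, let $Y = (Y_1, \dots, Y_n)$ be another copy of $X$, independent of both $X$ and
$X'$. Fix $1 \le i \le n$. Let

$\tilde X := (X_1, \dots, X_{i-1}, Y_i, X_{i+1}, \dots, X_n)$.

Similarly, for each $B \subseteq [n]$, define $\tilde X^B$ by replacing $X_i$ with $Y_i$ in $X^B$.
Explicitly, if $i \notin B$, then

$\tilde X^B = (X^B_1, \dots, X^B_{i-1}, Y_i, X^B_{i+1}, \dots, X^B_n)$,

whereas if $i \in B$, then $\tilde X^B = X^B$. With this notation, let

$R_{ji} := (f(\tilde X) - f(\tilde X^j))(f(\tilde X^A) - f(\tilde X^{A \cup j}))$,

and put

$h_i := E\Big( \sum_{j \notin A} (R_j - R_{ji}) \Big)^2$.

It follows from a combination of Lemmas 4.3 and 4.4 that

(20) $\mathrm{Var}(E(T_A|X)) \le E(\mathrm{Var}(T_A|X')) \le \frac{1}{2} \sum_{i=1}^n h_i$.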

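Let us now proceed to bound $h_i$. First, take some $j \notin A \cup i$ and let

$d^1_{ji} := I\{\{i,j\} \in G(X)\}$,
$d^2_{ji} := I\{\{i,j\} \in G(X^j)\}$,
$d^3_{ji} := I\{\{i,j\} \in G(\tilde X)\}$

and

$d^4_{ji} := I\{\{i,j\} \in G(\tilde X^j)\}$.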





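Now, suppose that for a particular realization, we have $d^1_{ji} = d^2_{ji} = d^3_{ji} =
d^4_{ji} = 0$. Since $G$ is an interaction rule for $f$, this event implies that

$f(X) - f(X^j) = f(\tilde X) - f(\tilde X^j)$.

If we now take $X^A$ instead of $X$ and $\tilde X^A$ instead of $\tilde X$, and define $e^1_{ji}$, $e^2_{ji}$,
$e^3_{ji}$ and $e^4_{ji}$ analogously, then the event $e^1_{ji} = e^2_{ji} = e^3_{ji} = e^4_{ji} = 0$ implies that

$f(X^A) - f(X^{A \cup j}) = f(\tilde X^A) - f(\tilde X^{A \cup j})$,

irrespective of whether or not $i \in A$. Now, let

$L_i := \max_{j \notin A} |\Delta_j f(X)\Delta_j f(X^A) - \Delta_j f(\tilde X)\Delta_j f(\tilde X^A)|$.

From the preceding observations, we see that for $j \notin A \cup i$,

$|R_j - R_{ji}| \le L_i \sum_{k=1}^4 (d^k_{ji} + e^k_{ji})$.

When $i \notin A$ and $j = i$, we simply have $|R_j - R_{ji}| \le L_i$. Applying the
Cauchy-Schwarz inequality, we now get

(21) $h_i \le \Big[ E(L_i^4)\, E\Big( I\{i \notin A\} + \sum_{j \notin A \cup i} \sum_{k=1}^4 (d^k_{ji} + e^k_{ji}) \Big)^4 \Big]^{1/2}$.

Now, by the inequality $(\sum_{i=1}^r a_i)^4 \le r^3 \sum_{i=1}^r a_i^4$, we have

$E\Big( I\{i \notin A\} + \sum_{j \notin A \cup i} \sum_{k=1}^4 (d^k_{ji} + e^k_{ji}) \Big)^4$
$\le 9^3 I\{i \notin A\} + 9^3 \sum_{k=1}^4 E\Big( \sum_{j \notin A \cup i} d^k_{ji} \Big)^4 + 9^3 \sum_{k=1}^4 E\Big( \sum_{j \notin A \cup i} e^k_{ji} \Big)^4$.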

To get a bound for the above terms, first consider the $d^1$ term. It follows
directly from Lemma 4.6 that for any $j, k, l$ and $m$,

$E(d^1_{ji} d^1_{ki} d^1_{li} d^1_{mi}) \le C \frac{E(\delta_1^r)}{n^r}$,

where $r$ = the number of distinct indices among $j, k, l, m$ and $\delta_1$ is the degree
of the vertex 1 in $G(X)$. Recall the definition of $\delta$ from the statement of the
theorem and observe that $\delta \ge \delta_1 + 1$. It is now easy to deduce that

$E\Big( \sum_{j \notin A \cup i} d^1_{ji} \Big)^4 \le C E(\delta^4) \frac{n - |A|}{n}$.

Now, consider the problem of bounding $E(d^2_{ji} d^2_{ki} d^2_{li} d^2_{mi})$. First, suppose $j$, $k$,
$l$ and $m$ are distinct. Let $\hat X$ be the random vector on $\mathcal{X}^{n+4}$ defined as

$\hat X := (X_1, \dots, X_n, X'_j, X'_k, X'_l, X'_m)$.

Note that if $d^2_{ji} = d^2_{ki} = d^2_{li} = d^2_{mi} = 1$, then $\{i, n+1\}$, $\{i, n+2\}$, $\{i, n+3\}$ and
$\{i, n+4\}$ are all edges in the extended graph $G'(\hat X)$. Since $G'$ is a symmetric
rule and the components of $\hat X$ are i.i.d., it again follows from Lemma 4.6
that

$E(d^2_{ji} d^2_{ki} d^2_{li} d^2_{mi}) \le C \frac{E(\delta^4)}{n^4}$.

Now, suppose $j, k, l$ are distinct, but $m = l$. Let $s$ be an element of $[n]$
different from $j$, $k$ and $l$. Define

$\hat X := (X_1, \dots, X_n, X'_j, X'_k, X'_l, X'_s)$

and proceed as before to conclude that, in this case,

$E(d^2_{ji} d^2_{ki} d^2_{li} d^2_{mi}) = E(d^2_{ji} d^2_{ki} d^2_{li}) \le C \frac{E(\delta^4)}{n^3}$.

In general, if $r$ is the number of distinct elements among $j, k, l, m$, then

$E(d^2_{ji} d^2_{ki} d^2_{li} d^2_{mi}) \le C \frac{E(\delta^4)}{n^r}$.

From this, we get

$E\Big( \sum_{j \notin A \cup i} d^2_{ji} \Big)^4 \le C E(\delta^4) \frac{n - |A|}{n}$.

The $d^3$, $e^1$ and $e^3$ terms can be given the same bound as the $d^1$ term, while
the $d^4$, $e^2$ and $e^4$ terms are similar to the $d^2$ term. Combining, we get

$E\Big( I\{i \notin A\} + \sum_{j \notin A \cup i} \sum_{k=1}^4 (d^k_{ji} + e^k_{ji}) \Big)^4 \le C I\{i \notin A\} + C E(\delta^4) \frac{n - |A|}{n}$.

It is easy to show, using the Cauchy-Schwarz inequality, that $E(L_i^4) \le
CE(M^8)$, where $M = \max_j |\Delta_j f(X)|$. Using these bounds in (21) and the
inequality $\sqrt{x + y} \le \sqrt{x} + \sqrt{y}$, we get

$h_i \le CE(M^8)^{1/2} \Big( I\{i \notin A\} + E(\delta^4)^{1/2} \sqrt{\frac{n - |A|}{n}} \Big)$.

Substituting this bound in (20), we get

$\mathrm{Var}(E(T_A|X)) \le CE(M^8)^{1/2}E(\delta^4)^{1/2}(n - |A| + \sqrt{n(n - |A|)})$
$\le CE(M^8)^{1/2}E(\delta^4)^{1/2}\sqrt{n(n - |A|)}$.

This completes the proof. □